Attribute Correction - Data Cleaning Using Association Rule and Clustering Methods
Similar resources
Outlier Detection Using Clustering Methods: a data cleaning application
This paper describes a methodology for the application of hierarchical clustering methods to the task of outlier detection. The methodology is tested on the problem of cleaning Official Statistics data. The goal is to detect erroneous foreign trade transactions in data collected by the Portuguese Institute of Statistics (INE). These transactions are a minority, but still they have an important ...
Data Profiling Using Attribute Clustering
Finding trends in database data is hard when presented with data sets containing many attributes (columns). The difficulty is increased when the data is in text fields and may include large summary or remarks fields. This paper discusses an approach that uses attribute level clustering in order to discover trends or profiles in the data. This is different from traditional uses of clustering in ...
A Comparison of Categorical Attribute Data Clustering Methods
Clustering data in Euclidean space has a long tradition, and considerable attention has been paid to analyzing several different cost functions. Unfortunately, these results rarely generalize to clustering of categorical attribute data. Instead, a simple heuristic, k-modes, is the most commonly used method despite its modest performance. In this study, we model clusters by their empirical distribut...
Mining the Banking Customer Behavior Using Clustering and Association Rules Methods
The unprecedented growth of competition in banking technology has raised the importance of retaining current customers and acquiring new ones, which makes it important to analyze customer behavior based on bank databases. Analyzing bank databases to understand customer behavior is difficult because bank databases are multi-dimensional, comprising monthly account records and daily t...
Data Cleaning Methods
Data Cleaning methods are used for finding duplicates within a file or across sets of files. This overview provides background on the Fellegi-Sunter model of record linkage. The Fellegi-Sunter model provides an optimal theoretical classification rule. Fellegi and Sunter introduced methods for automatically estimating optimal parameters without training data that we extend to many real world sit...
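The Fellegi-Sunter decision rule summarized in the Data Cleaning Methods entry above can be illustrated with a short sketch. This is a minimal Python example, assuming the per-field m-probabilities (agreement given a true match) and u-probabilities (agreement given a non-match) are already known and that fields are conditionally independent; the field names, probabilities, and thresholds are hypothetical, and the parameter-estimation methods the summary mentions are not shown.

```python
import math
from typing import Mapping


def field_weight(agrees: bool, m: float, u: float) -> float:
    """Log2 likelihood ratio contributed by one compared field."""
    if agrees:
        return math.log2(m / u)
    return math.log2((1 - m) / (1 - u))


def match_weight(agreement: Mapping[str, bool],
                 m_probs: Mapping[str, float],
                 u_probs: Mapping[str, float]) -> float:
    """Total weight of a record pair; field weights add under conditional independence."""
    return sum(field_weight(agreement[f], m_probs[f], u_probs[f]) for f in agreement)


def classify(weight: float, lower: float, upper: float) -> str:
    """Three-way Fellegi-Sunter style decision using two (hypothetical) thresholds."""
    if weight >= upper:
        return "link"
    if weight <= lower:
        return "non-link"
    return "possible link"  # left for clerical review


# Hypothetical parameters for three compared fields.
M = {"surname": 0.9, "birth_year": 0.95, "city": 0.8}
U = {"surname": 0.01, "birth_year": 0.05, "city": 0.1}

pair = {"surname": True, "birth_year": True, "city": True}
w = match_weight(pair, M, U)
print(round(w, 2), classify(w, lower=0.0, upper=10.0))
```

Agreement on a rare value (small u) contributes a large positive weight, while disagreement on a field that true matches almost always share (large m) contributes a large negative one; pairs falling between the two thresholds are the ones the model leaves undecided.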
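The k-modes heuristic named in the categorical-clustering summary above can be sketched as follows. This is a minimal Python illustration of the basic k-modes idea only, not the method proposed in that study: it uses simple matching dissimilarity (the number of attributes whose categories differ) and per-attribute modes as cluster centers. The naive initialization (first k records), the tie-breaking toward the lowest cluster index, and the sample records are assumptions made for illustration.

```python
from collections import Counter
from typing import List, Sequence, Tuple

Record = Tuple[str, ...]


def mismatches(a: Sequence[str], b: Sequence[str]) -> int:
    """Simple matching dissimilarity: number of attributes whose categories differ."""
    return sum(x != y for x, y in zip(a, b))


def column_modes(records: List[Record]) -> Record:
    """Most frequent category per attribute (ties go to the value seen first)."""
    return tuple(Counter(col).most_common(1)[0][0] for col in zip(*records))


def k_modes(records: List[Record], k: int, max_iter: int = 100):
    modes = [records[i] for i in range(k)]  # assumed naive init: first k records
    labels = [-1] * len(records)
    for _ in range(max_iter):
        # Assign each record to its nearest mode (min() breaks ties by lowest index).
        new_labels = [min(range(k), key=lambda c: mismatches(r, modes[c]))
                      for r in records]
        if new_labels == labels:
            break  # assignments are stable, so the modes are too
        labels = new_labels
        for c in range(k):
            members = [r for r, lab in zip(records, labels) if lab == c]
            if members:  # keep the previous mode if a cluster becomes empty
                modes[c] = column_modes(members)
    return labels, modes


data = [
    ("red", "small", "round"),
    ("red", "small", "oval"),
    ("blue", "large", "square"),
    ("blue", "large", "square"),
    ("red", "small", "round"),
    ("blue", "large", "round"),
]
labels, modes = k_modes(data, k=2)
print(labels, modes)
```

Because it depends only on counting category matches, k-modes needs no numeric encoding of the attributes; the trade-off, as the summary notes, is that this simple heuristic can perform modestly and is sensitive to how the initial modes are chosen.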
Journal
Journal title: International Journal of Data Mining & Knowledge Management Process
Year: 2011
ISSN: 2231-007X
DOI: 10.5121/ijdkp.2011.1202